DOCUMENT RESUME

ED 166 237                                        008 226

AUTHOR          Echternacht, Gary
TITLE           Alternate Methods of Equating GRE Advanced Tests.
                Project Report PR 71-17 (October 1971). GRE Board
                Professional Report GREB No. 69-2P.
INSTITUTION     Educational Testing Service, Princeton, N.J.
REPORT NO       GREB-69-2P
PUB DATE        Jun 74
NOTE            90p.
AVAILABLE FROM  Graduate Record Examinations, Educational Testing
                Service, Princeton, New Jersey 08541 (free while
                supplies last)
EDRS PRICE      MF-$0.83 HC-$4.67 Plus Postage.
DESCRIPTORS     Achievement Tests; Aptitude Tests; *College Entrance
                Examinations; *Comparative Statistics; *Equated
                Scores; *Goodness of Fit; Graduate Study; Higher
                Education
IDENTIFIERS     *Graduate Record Examinations



ABSTRACT 

When two different forms of a particular test are
given to different groups of candidates, it is often necessary to
make the test results for the two tests or groups as comparable as
possible; the statistical process used for this purpose is termed
equating. Two different methods of equating Graduate Record
Examinations (GRE) Advanced Tests were compared. One method used data
from a group of items that were common to both tests, while the other
method used data from the GRE verbal aptitude test and quantitative
aptitude test, also taken by both groups of candidates. The results
of 158 equatings for the 17 GRE Advanced Tests were tabulated and
presented graphically. Out of the 13 different test series, nine
different series had equating differences at one end or the other of
the score distribution equal to about one half a standard deviation,
and about three percent of the 158 equatings had differences of over
one standard deviation. (Author/CTM)



Reproductions supplied by EDRS are the best that can be made
from the original document.




ALTERNATE METHODS OF EQUATING 
GRE ADVANCED TESTS



Gary Echternacht



GRE Board Professional Report GREB No. 69-2P



Project Report PR 71-17
(October 1971)



June 1974 



This report presents the findings of a 
research project funded by and carried
out under the auspices of the Graduate 
Record Examinations Board. 




NC SERVICE, PRINCETON, NEW JERSEY □ BERKELEY, CALIFORNIA D EVANS 






Copyright (c) 1974 by Educational Testing Service. All rights reserved 



ALTERNATE METHODS OF EQUATING ADVANCED TESTS*

When different forms of a particular test are given either concurrently
or at different administrations, it is often required that the test results be
made comparable. For instance, at ETS, scores from the GRE Advanced Biology
Test used in the October 1967 administration were made comparable to the form
used in the January 196[?] administration. Thus, the integrity of the test can
be protected by using different forms while, at the same time, test scores are
comparable and on the same score scale though they come from different forms
of the same test. This process of making tests comparable is termed equating
and is carried out whenever a new test form is introduced in a testing program.

One basic requirement for the type of equating traditionally used at ETS,
appropriately termed common-item equating, is the existence of a number of
test items common to both the new test form and the old test form to which the
new test is equated. These common items serve as a basis for estimating how
each group would have performed on the test taken by the other which, in turn,
is used to convert the scores on the new test form to the score scale used by
the old test form. Although the number of common items necessary for effective
equating is substantial, it is usually rather small when compared to the total
test length.

In the spring of 19[?], most of the examination committees for the GRE
Advanced Tests expressed an interest in deriving one or more subscores from the
various tests for which they were responsible. The feasibility of providing
such subscores was considered and a number of questions were raised, among
those being the equating of subscores. If the traditional common-item equating
were to be used, the number of common items required for equating subscores
would be so large that proportionally few new items would result in the test
form. As an alternative to common-item equating, equating through the Verbal
and Quantitative scores from the Aptitude Test was suggested.

*This study has had a rather dynamic history, having been conceived by
[name illegible], with the equatings being supervised by Susan Ford. The author
inherited the project at the time of reorganization within the company and did
no work on the project other than write this report. He is solely responsible
for its contents.

Statement of the Problem

In order to study some of the difficulties in equating subscores using
Verbal and Quantitative test scores, a study was undertaken to answer the
questions: How does equating the total score of the GRE Advanced Tests using
the Verbal and Quantitative test scores from the GRE Aptitude Test compare with
the traditional common-item equating for these same tests; more specifically,
are there practical differences between the two equating methods from the
standpoint of reported scores? Is the relationship between the two equating
methods constant for all Advanced Tests or is Verbal and Quantitative equating
more suitable for some tests than others? Are there differences between the
two methods over various administration months? Are there differences in
equatings across various educational levels?

It was hypothesized that equating through the Verbal and Quantitative
scores would be similar to common-item equating especially when the correlation
between the Aptitude Test scores and an Advanced Test score was high. For
example, the Verbal and Quantitative equating should prove approximately the
same as common-item equating for the Advanced Tests in Economics, Sociology,
Philosophy, and Biology as these tests correlate highest with the Verbal and
Quantitative scores, while, on the other hand, Spanish, French, and Physics
Advanced Tests correlate lowest with the Aptitude Test scores; thus, less
similarity was hypothesized. Further, when a test was equated to itself
using Verbal and Quantitative equating, the parameters should be approximately
one and zero.* The degree to which this is not true reflects the error in the
equating parameters.

One problem that occurs when using Verbal and Quantitative Test scores
for equating is that of different levels of candidate preparation. As an
example, consider two examinees, one seeking admission to graduate school for
the first time, the other having completed a Master's program seeking entry
into a doctoral program. These two candidates are likely to score very
differently on an Advanced Test although their Aptitude Test scores are the
same. This fact lowers the correlation between the Advanced Test and the
Aptitude Test scores, weakening the strength of the equating.

The Sample

All candidates who took one of 17 GRE Advanced Tests between October 1967
and September 1968 inclusive and who were registered as regular national
candidates, candidates for special administration, National Science Foundation
candidates, or Oak Ridge Institute of Nuclear Science candidates were selected.
A further constraint on the sample was that each candidate for further study
had Aptitude Test scores earned no more than three months prior to the Advanced
Test score.

Multiple scores for either the Advanced Tests or the Aptitude Tests were
treated as follows: In the event of multiple Aptitude Test scores the Aptitude
Test score nearest the first Advanced Test score was taken. Multiple Advanced
Test scores could not be identified since Advanced Test scores were sampled
rather than the candidates themselves.

*The equating parameters are of the form $Y = A + BX$, where Y is the old
form scale and X the new form scale. When we say the parameters should be one
and zero when a test is equated to itself (that is, A = 0, B = 1), we mean
Y = X; the two score scales should be the same.

Since both old and new equating forms had to appear during the period
under study, only candidates who took these forms were selected. This action
resulted in some candidates taking forms in Economics, Political Science, and
Spanish not being selected. The total obtained sample size was 85,111 for 17
Advanced Tests. The Advanced Test with the highest volume was Education with
[figure illegible] candidates selected while the Advanced Test in Geology
recorded the lowest volume with [?]61 candidates. Between four and six test
dates were considered for each Advanced Test.

Methodology

The genealogical charts, Appendix 1, for the Advanced Tests were used to
determine the forms to be equated using Verbal and Quantitative equating from
the totality of test forms given between October and September of the test
year. The rule was to duplicate any past common-item equating with Verbal
and Quantitative Aptitude score equating. Thus, for example, in Economics
the form [designation illegible] was equated to forms [designations illegible]
through Verbal and Quantitative Aptitude scores since the traditional
common-item equating was accomplished by equating these same forms in 19[?].*



*When one new form is equated to two other test forms through common
items, the results are two equating lines of the form $A_1 + B_1 X$ and
$A_2 + B_2 X$ for converting raw scores to scaled scores. To obtain one
operational conversion line, the angle between these two conversion lines is
bisected and that bisector becomes the conversion line for score reporting.
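The bisection of two conversion lines can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not the
operational routine: the function name bisect_lines is hypothetical, the
bisector is taken to pass through the intersection of the two lines with a
slope equal to the tangent of the mean of their inclination angles (which
selects the bisector lying between the two lines), and parallel lines are
handled by averaging the intercepts.

```python
import math


def bisect_lines(a1, b1, a2, b2, tol=1e-12):
    """Return (A, B) of the line bisecting A1 + B1*X and A2 + B2*X.

    Assumption: the wanted bisector is the one lying between the two
    conversion lines, i.e. its inclination is the mean of their inclinations.
    """
    if abs(b1 - b2) < tol:
        # Parallel lines never meet; use the midline (an assumption).
        return (a1 + a2) / 2.0, b1
    # Intersection point of the two conversion lines.
    x0 = (a2 - a1) / (b1 - b2)
    y0 = a1 + b1 * x0
    # Slope whose angle is halfway between the two line angles.
    slope = math.tan((math.atan(b1) + math.atan(b2)) / 2.0)
    # Intercept chosen so the bisector passes through the intersection.
    return y0 - slope * x0, slope


if __name__ == "__main__":
    # Illustrative (made-up) conversion lines: 200 + 5X and 180 + 6X.
    print(bisect_lines(200.0, 5.0, 180.0, 6.0))
```

Note that the bisector's slope is not the arithmetic mean of the two slopes;
averaging the angles and averaging the slopes agree only approximately when
the two lines are close together.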



It should be emphasized here that the sample used for Verbal and
Quantitative equating was the sample described previously, while the common-
item equatings used as comparisons were the original equatings carried out
at various times in the past. Thus, the comparisons of common-item equating
with Verbal and Quantitative equating were valid only to the extent of the
stability of common-item equating from sample to sample. In order to
investigate the stability question for Verbal and Quantitative equating to some
extent, whenever a particular form was given more than once during the year
under study, these tests were equated to each other.

The procedure for equating using Verbal and Quantitative Aptitude scores
is described completely in Appendix [number illegible] and is similar to the
traditional common-item approach. Generally speaking, though, the process of
Verbal and Quantitative equating goes as follows. For both the new form and the
old form to which the new form is being equated, conceptualize two regression
planes for predicting Advanced Test scores from Verbal and Quantitative
Aptitude Test scores, one for the group of examinees taking the new form only,
the other for the group taking the old form. We assume these regression planes
to be identical in their intercepts, slopes, and errors of estimate. From these
assumptions, equations for estimating the mean score and variance on the new
form for the total group (new form examinees and old form examinees) are
developed. Similarly for the old form, this same procedure is carried out and
estimates for the total group are obtained for the old form mean and variance.
These two distributions are standardized and set equal to each other, after
which the new form raw scores are given as a function of the old form raw
scores and the equating is complete for raw scores. These equated raw scores
are then converted to scaled scores using the old form scaled conversion
parameters and the equating is complete.
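The steps just described can be sketched in Python. This is a hedged sketch
of a Tucker-style linear equating with two anchor scores, not a transcription
of the appendix procedure, which is not reproduced in this section. The
function names (vq_equate and the underscored helpers) are hypothetical. The
total-group formulas follow from the three stated assumptions: if the
regression of the form on Verbal and Quantitative has the same intercept,
slopes, and error variance in every group, the total-group mean is the group
mean shifted by the slopes times the difference in anchor means, and the
total-group variance is the group variance adjusted by the difference in
anchor covariance matrices.

```python
from statistics import fmean


def _cov(u, v):
    """Population covariance of two equal-length score lists."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def _slopes(score, v, q):
    """Least-squares slopes of score on (v, q) from the 2x2 normal equations."""
    svv, sqq, svq = _cov(v, v), _cov(q, q), _cov(v, q)
    svy, sqy = _cov(v, score), _cov(q, score)
    det = svv * sqq - svq ** 2
    return (svy * sqq - sqy * svq) / det, (sqy * svv - svy * svq) / det


def _total_group_moments(score, v, q, v_all, q_all):
    """Estimated mean and variance of a form in the combined group."""
    bv, bq = _slopes(score, v, q)
    mean = (fmean(score)
            + bv * (fmean(v_all) - fmean(v))
            + bq * (fmean(q_all) - fmean(q)))
    dvv = _cov(v_all, v_all) - _cov(v, v)
    dqq = _cov(q_all, q_all) - _cov(q, q)
    dvq = _cov(v_all, q_all) - _cov(v, q)
    var = (_cov(score, score)
           + bv * bv * dvv + bq * bq * dqq + 2.0 * bv * bq * dvq)
    return mean, var


def vq_equate(new, old):
    """new and old are (form_scores, verbal, quantitative) triples of lists.

    Returns (A, B) with old-form raw equivalent = A + B * new-form raw score,
    obtained by setting the standardized total-group distributions equal.
    """
    x, vx, qx = new
    y, vy, qy = old
    v_all, q_all = list(vx) + list(vy), list(qx) + list(qy)
    mx, var_x = _total_group_moments(x, vx, qx, v_all, q_all)
    my, var_y = _total_group_moments(y, vy, qy, v_all, q_all)
    b = (var_y / var_x) ** 0.5
    return my - b * mx, b


if __name__ == "__main__":
    # Made-up scores for five examinees, used for both groups.
    x = [20.0, 35.0, 40.0, 60.0, 48.0]
    v = [400.0, 500.0, 600.0, 700.0, 550.0]
    q = [450.0, 520.0, 480.0, 700.0, 610.0]
    print(vq_equate((x, v, q), (x, v, q)))  # close to A = 0, B = 1
```

Equating a form to itself on the same sample returns parameters of
approximately zero and one, which is the self-equating check the report
applies; the final conversion to scaled scores would compose this line with
the old form's raw-to-scale conversion.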

In order to make comparisons of the Verbal and Quantitative score equating
and common-item equating, the equating lines for both methods were graphed
for obtaining scaled scores from raw scores. There were separate graphs for
each Advanced Test and each particular equating using common old forms within
each Advanced Test. The Advanced Tests were then classified into one of three
categories depending on the difference at the extreme raw scores between the
two methods of equating under study. These were

Class I.   No extreme differences of greater than 50 score
           points at either extreme

Class II.  A difference of more than 50 points at only one
           extreme

Class III. A difference of more than 50 score points at both
           extremes.
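The three-way rule can be written as a short Python function. This is an
illustrative sketch: the function name classify is hypothetical, and taking
the absolute value of each difference is an assumption (the tabulated
differences are signed, and the class definitions do not say how sign is
treated). It classifies a single pair of endpoint differences; how several
equatings of one test are combined into a single class is not shown here.

```python
def classify(diff_high, diff_low, threshold=50):
    """Class I, II or III for one pair of endpoint differences.

    diff_high and diff_low are scaled-score differences between the two
    equating methods at the top and bottom of the score range.
    """
    # Count the extremes at which the methods disagree by more than
    # the threshold, ignoring the direction of the disagreement.
    exceed = sum(abs(d) > threshold for d in (diff_high, diff_low))
    return ("I", "II", "III")[exceed]


if __name__ == "__main__":
    for pair in [(24, -5), (125, 10), (-146, 88)]:
        print(pair, classify(*pair))
```

A difference of exactly 50 points does not count as exceeding the threshold
under this reading of "more than 50".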

It was assumed Advanced Tests falling into Class I would be most
amenable to equating through Verbal and Quantitative Aptitude Test scores
while the other tests would be less favorable for that method. For Advanced
Tests falling in Class III, justifying the use of Verbal and Quantitative
equatings would be particularly difficult.

The sample was further partitioned by educational level for each
Advanced Test. The educational levels were indicated by every candidate
at the time he took the test and are: not now in college, sophomore, junior,
senior, first and second year graduate students. Equatings using Verbal and
Quantitative Test scores were to be completed for every Advanced Test and
every educational level.



Results

A total of 158 equatings were accomplished for 17 different Advanced
Tests. The equations for converting raw scores on the new form to the
scaled scores were tabulated and graphed, the graphs appearing in Appendix 3.
For each conversion equation obtained using Verbal and Quantitative equating,
four scores were obtained, those being the scaled score when a candidate
answers no items correctly, raw score zero; the scaled score when the
candidate answers every item correctly; the scaled score corresponding to
the lowest raw score found in the equating sample; and the scaled score
corresponding to the highest raw score found in the equating sample. These
last two scores were included in an attempt to make the comparisons more
valid in a "practical" manner; for example, no one obtains the highest
theoretical score for most advanced tests; therefore, the obtained extreme
score might provide a better location for obtaining greater insight as to
the practical differences between the methods.
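The four evaluation points can be made concrete with a small Python sketch.
It assumes both conversions are straight lines from raw score to scaled
score and reports the common-item scaled score minus the Verbal and
Quantitative scaled score at each point, the direction given in the title of
Table 1. The function name endpoint_differences, the dictionary keys, and the
numbers in the usage example are all hypothetical.

```python
def scaled(line, raw):
    """Apply a linear raw-to-scale conversion given as (A, B)."""
    a, b = line
    return a + b * raw


def endpoint_differences(common_item, vq, max_raw, obs_low, obs_high):
    """Common-item minus V&Q scaled score at the four raw-score points.

    common_item, vq : (A, B) conversion lines for the two methods
    max_raw         : raw score when every item is answered correctly
    obs_low/high    : lowest and highest raw scores seen in the sample
    """
    points = {
        "possible_low": 0,
        "possible_high": max_raw,
        "observed_low": obs_low,
        "observed_high": obs_high,
    }
    return {name: scaled(common_item, raw) - scaled(vq, raw)
            for name, raw in points.items()}


if __name__ == "__main__":
    # Made-up conversion lines for a 100-item form.
    print(endpoint_differences((200.0, 5.0), (210.0, 4.75), 100, 10, 90))
```

Because both conversions are linear, the difference is itself linear in the
raw score, so the largest disagreement in absolute value always falls at one
end of whichever raw-score range is examined.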

Equivalent scaled scores were obtained using the common-item conversions
for the same raw scores. Values obtained from the Verbal and Quantitative
conversions were then subtracted from the values obtained from the common-
item conversions and tabulated in Table 1. The differences obtained at zero
raw score and at the maximum raw score were termed possible score differences,
while the remaining two differences were for observed scores. The subscripts
represent the number of the administration month. Using the classification
system previously described on the possible scaled score differences, five
Advanced Tests fall into Class I, having differences less than 50 scaled
score points. The tests classified in Class I were Education, History,
Literature, Political Science, and Spanish. [Text illegible] Advanced Tests
join the previously mentioned in the first classification.

The difference in [word illegible] would be that, were the Verbal and
Quantitative equatings used, candidates would obtain higher scaled scores at
the lower end of the range than they would had common-item equating been used.

The Advanced Tests falling in Class II on the possible scaled score
difference were Biology, Chemistry, Economics, Engineering, Mathematics,
and Psychology. These all were characterized by differences of more
than 50 scaled score points at the top end of the possible scaled score
range.

Advanced Tests in French, Geology, Philosophy, Physics, and Sociology
were classified in Class III on the possible scaled score criterion with
only French differing in classification on the observed scaled score criterion.
A complete classification for both observed and possible scaled score
differences appears in Table 2.

An interesting event did occur when the Verbal and Quantitative equatings
were compared with the common-item equatings in a way other than
examining the endpoint differences of each equating. Most of the common-
item equatings involved two old forms as the genealogical charts indicate.
If the Verbal and Quantitative equating through two old forms is performed
as for common-item equating, that is, bisecting the two obtained equating
lines and using the bisector for score reporting, the results change a little
as demonstrated in Appendix 3. The score differences at the extremes for
Advanced Tests in Biology, Engineering Form [designation illegible], French,
Geology, Music, [test and form illegible], and Psychology are each less than
50 points. It is also noteworthy that when Advanced Tests were equated to
only one form, poor agreement between equating methods was found.

When a test form was used more than once during the testing year under
study, these tests were equated to themselves. Differences between the
Verbal and Quantitative equatings and the common-item equatings were
calculated and tabulated. These differences provided a rough estimate of how
Verbal and Quantitative equatings varied from one equating sample to another.
Unfortunately, there are no comparable figures available for common-item
equating. The results appear in Table 3.

The results of these calculated differences were mixed. In considering
the differences over all Advanced Tests, the process of Verbal and
Quantitative equating seems to be unstable, as 6 of the 25 equatings resulted
in score differences of 50 points or more, roughly amounting to about 25 per
cent of the equatings. On the other hand, when the equatings were taken by
individual Advanced Tests, the number of equatings performed was insufficient
for drawing any meaningful conclusions.

For those same Verbal and Quantitative equatings, correlations, both
first order and multiple correlations, were calculated for both the groups
making up the old and new form equating samples. These correlations were
between the form and Verbal Aptitude, the form and Quantitative Aptitude,
Verbal and Quantitative Aptitude, and the multiple correlation of the form
with the two Aptitude Test scores. These correlations tend to remain stable
from old to new form with the exception of the correlation between the form
and Verbal in the cases of Chemistry and Mathematics, between Verbal and
Quantitative for Spanish, and the multiple correlations for Mathematics
and Spanish. These results appear in Table 4.
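The multiple correlation of a form with the two Aptitude Test scores can be
obtained from the three first-order correlations by the standard
two-predictor identity. The Python sketch below illustrates that identity
only; the function name multiple_r and the correlation values in the usage
example are hypothetical and are not taken from Table 4.

```python
def multiple_r(r_fv, r_fq, r_vq):
    """Multiple correlation of a form with Verbal and Quantitative.

    r_fv : correlation between the form and Verbal
    r_fq : correlation between the form and Quantitative
    r_vq : correlation between Verbal and Quantitative (must not be +/-1)
    """
    r_squared = ((r_fv ** 2 + r_fq ** 2 - 2.0 * r_fv * r_fq * r_vq)
                 / (1.0 - r_vq ** 2))
    return r_squared ** 0.5


if __name__ == "__main__":
    # Made-up first-order correlations.
    print(round(multiple_r(0.6, 0.5, 0.4), 3))
```

When the second predictor is uncorrelated with both the form and the first
predictor, the multiple correlation reduces to the first-order correlation
with the first predictor.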

The sample was partitioned by educational level for each Advanced Test.
Counts for each educational level of every Advanced Test were obtained and,
based on these counts and cost factors, no equatings were performed by
educational level. The counts showed that most everyone who took Advanced
Tests were seniors and that equatings for the other educational levels were
prohibitive based on the small sample numbers.

Discussion and Comments

The question now arises of whether the study accomplished the objective
it set up. Clearly, some practical differences were found between common-
item and Verbal and Quantitative equating methods in terms of the 50-point
classification scheme. One difficulty in interpreting these differences
comes about when the samples used for equating are considered. Since
different samples were used for each equating one could logically suspect
these differences. The question of comparing the two types of equating lines
using identical samples cannot be answered. Common-item equatings
corresponding to Verbal and Quantitative equatings could have been performed
using the same samples had there been funds for rescoring all answer sheets
and 158 additional equatings.

Another question arising in the interpretation of the equating line
differences was the significance of the differences obtained. Fifty points
was the criterion for significance in this study, but was that too much or
was that enough? No probability statements can be made concerning statistical
significance, and one is forced to use "careful human judgment." Since nothing
is known of how sampling differences affect common-item equating and only very
limited evidence is available for Verbal and Quantitative equating, no
statistical test can be made.

The differences obtained were assessed at the endpoints of the possible
score ranges. One might question the need for differences to be calculated
here. For example, which end of the score scale is more damaged by a lack
of agreement between equating methods? It might be that the need to
differentiate among candidates scoring at the highest end of the scale is
not necessary, thus allowing a relaxation of the 50-point score difference
at the high end. Also, one might reason, no one scores at the highest
possible score anyway and no one cares whether that score is [figure
illegible] or [figure illegible] in most selection or diagnostic cases.
Therefore, one might question using the possible endpoints as difference
criteria and suggest some other less conservative points for assessing
practical differences.

Was the relationship between the two equating methods constant for all
tests? This we conclude was not the case. Had the relationship been constant,
we would have expected all the Advanced Tests to fall in the same
classification. Also, a look through Appendix 3 will illustrate the
variability of the Verbal and Quantitative equating lines with respect to the
common-item equating line. Clearly, the Verbal and Quantitative equating is
more suitable for those Advanced Tests falling in Class I than those falling
in Class III with respect to agreement with common-item equating.



The last two questions, differences across various administrations and
educational levels, were not answered at all. In order to answer the first
question, all forms equated through Verbal and Quantitative scores would
have had to be equated using common items. The second question could not be
answered due to the relatively small sample sizes obtained for the various
educational levels.

The main difficulty this study encountered involved the lack of
knowledge of the properties of the common-item equating method. For
example, consider the comparison of the operational common-item equating
line with a Verbal and Quantitative equating line. Where do we want to
evaluate their differences? What first blocks our progress is our not
knowing how the common-item equating line varies from sample to sample.
Comparisons between the two methods must be considered in light of the
sampling variations of each method. The problem of sampling variation
cannot be easily solved mathematically as the estimates of the slope and
intercept of the equating line involve the ratio of two other estimates.
The answer could be found in computer simulation of equatings. If many
equatings were simulated under various conditions on the means, variances,
and correlations between the anchor and the test, estimates of the equating
line variation can be obtained and confidence bands drawn and comparisons
made more easily.
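A simulation of the kind suggested here might look like the following Python
sketch. Everything in it is an assumption made for illustration: the scores
are drawn from a bivariate normal model with a single standardized anchor,
both forms are given the same true mean (50) and standard deviation (10) so
that the true equating slope is one, the equating is a Tucker-style linear
equating through the anchor, and the function names are hypothetical. The
spread of the simulated slopes gives a rough picture of sampling variation at
a given sample size and anchor correlation.

```python
import random
from statistics import fmean


def _draw(n, rho, mean, sd, rng):
    """Draw n (form score, anchor score) pairs with correlation rho."""
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        score = mean + sd * (rho * z + (1.0 - rho * rho) ** 0.5 * e)
        rows.append((score, z))
    return rows


def _moments_on_total(rows, anchor_all):
    """Estimated form mean and variance in the combined group."""
    s = [r[0] for r in rows]
    z = [r[1] for r in rows]
    ms, mz, mt = fmean(s), fmean(z), fmean(anchor_all)
    var_s = fmean((a - ms) ** 2 for a in s)
    var_z = fmean((a - mz) ** 2 for a in z)
    var_t = fmean((a - mt) ** 2 for a in anchor_all)
    g = fmean((a - ms) * (b - mz) for a, b in zip(s, z)) / var_z
    return ms + g * (mt - mz), var_s + g * g * (var_t - var_z)


def equate_once(n, rho, rng):
    """One simulated equating; returns (A, B) of the equating line."""
    new = _draw(n, rho, 50.0, 10.0, rng)
    old = _draw(n, rho, 50.0, 10.0, rng)
    anchor_all = [r[1] for r in new] + [r[1] for r in old]
    mx, var_x = _moments_on_total(new, anchor_all)
    my, var_y = _moments_on_total(old, anchor_all)
    b = (var_y / var_x) ** 0.5
    return my - b * mx, b


def simulate(reps, n, rho, seed=0):
    """Mean and standard deviation of the slope over repeated equatings."""
    rng = random.Random(seed)
    slopes = [equate_once(n, rho, rng)[1] for _ in range(reps)]
    mean_b = fmean(slopes)
    sd_b = fmean((b - mean_b) ** 2 for b in slopes) ** 0.5
    return mean_b, sd_b


if __name__ == "__main__":
    for n in (50, 1000):
        print(n, simulate(100, n, 0.7, seed=1))
```

Repeating the run over a grid of sample sizes, anchor correlations, and group
differences would give the empirical variation from which confidence bands
around an equating line could be drawn.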

Another area of concern should be that of the robustness of the equating
procedure against violations in the three basic assumptions. By assessing
the degree to which violations in the assumptions affect the equating outcome,
the total variation in the equating procedure can be partitioned into two
parts, one due to the lack of compliance with the assumptions, the other
due to sampling variation. In practice one can do nothing about the second
component, but one can select samples for equating where the assumptions are
most likely to hold.

It is recommended that studies be undertaken to estimate the variability
of the common-item equating line and its robustness against various violations
in assumptions. Having accomplished that task, investigations of other
methods of equating could be undertaken with meaningful comparisons arising.



Test and 
Equating Forms 

Biology ' • 

P ' — N 
10 2 

P — N 
10 4 



M 



12 



0. 



1 2 
Chemis trv 



5 



P — N 
10 .12 



10 1 
°1 4 



Pi 



12 



^ -- ^1 
^12 h 



^12 4 



Table 1



Common-Item Scaled Score Minus
Verbal and Quantitative Scaled Scores
at the Extreme Ends of the Scale



Economics 



Possible Score Differences 



Hghst. Poss . 
Scaled Scores 

24 

41 

50 

34 



46 
8 

125 
37 

- 80 
-126 

- 3 

- 8 



Lwst. Poss. 
Scaled Scores 

- 5 
-12 

- "6 
19 

13 



- 2 

- 1 
10 

5 
4 
11 
22 
8 



Observed Score Differences 
Hghst. Poss. Lwst. Poss. 
Scaled Scores Scaled Scores 

18 



30 
43 
33 
50 

36 
4 
92 
40 
-55 
-85 
2 
69 



- 1 

- 7 

- 6 
19 
13 

1 

- 2 

- 1 

12 

15 N 

4 
11 

5 
11 





- 


20 


8 


21 


7 




- 


- 4 


20 


1 


18 


°1 


- N^, 


54 


-23 


8 


-23 


°1 




79 


-28 


72 


-28 



Table 1 Cont'd.

Test and 
Equating Forms 

Education 



Fren ch 



10 



12 



M 



10 



P,ossible Score Differences 
gghst, Posa, Lwst. Poss. 



Scaled Scor 



^2 


^o 


- 21 




- h 


1 






- 3 




- 0^ 


- 28 


\ 


- 


- 7 


Engineering 


- 




"^12 . 


- 78 




- 


22 


^1 




- 8 






81 


^ • 


- 


- 66 , 




\ 


2^X 


^7 ■ 







12 



57 
11 
54 
12 
16 
H 
12 
24 



Scaled Scores 
20 
15 
17 

- ? 
4 

20 
10 
14 
9 
20 

- 9 

- 7 



27 
- 4 
61 
3 

-22 
-42 
20 
■37 



Observed Score Differences 
Hghst. Poss. Lwst. Poss. 

Scaled Scores , Scaled Scores 

-15 



0 

- 1 
-21 

- 2 

-50 
21 
0 
61 

-35 
18 

-44 

-37 

- 6 
-30 

- 9 
-13 

13 

- 5 
12 



9 
6 
lA 
-10 
2- 

20 
10 
.10 
5 

-13^ 
-11 

21 

- 4 
50 
1 
19 

-33 
15 

-31 






Table 1 Cont'd 



Test and 


Possible Score Differeijces 


UDserveu o cu xt; 


X'X J. X- L c LI Ceo 


Equating Forms 


Hchst. Poss.' 


Lwst. TPoss. 


Hghs t. Poss . ' 


Lwst. Poss. 


Geology 

V 12 


Scaled Scores 


Scaled Scores 


0 caxeu ocores 






- 24 


-12 


. -19 


-11 » 


p 

-12 


^2 ■ 


1 


3 


2 


3 


p 




-67 1 


-53 - 


-62 


-53 


p ■ 




29 


-35 


5 




n L VJ 1. jr 


: \ 


1 


• 








M 

12 


1 

• "7 

- 7 1 


- 3 


- 7 


- 1 






- 26 




-14 






^2. 


, 30 


> 

10 


* 24 


1 t; 

X J 








- 1 




n 


\ - 


^ 


- 30 




— X 4 


X D 


% - 


M 

12 


- 37 


0 


-23 


1 




M 

12 


- 4 


-18 

1 
1 


- 7 


-15 

J 




^10 


- 21 


! 10 

1 

1 


-15 


13 / 




.^2 


32 




24 


- 2 


Literature 




\ 








^10 


4 


\ 10 


1 


13 






11 




7 






°12 


- 43 


9 


J / 


7 




°4 


- 34 


32 


-26 


30 




^10 


- 7 


^—10 


-10 


- 6 


O4. -- 


'2 


- 2 


- 2 


- 5 


3 


^7 


°12 , 


- 45 


- 3 


-37 


- 3 


'7 - 


"4 


- 33 


-22 


-23 


-22 






Test and 
Equating Forms 

Mathematics 



10 



12 
Musi c 



— P 



10 



~ P. 



0 



12 



— M 



12 



12 



10 



~ M_ 



10 



~ M 



1 

Philos ophy 

P — J 
12 10 



P — M 
•12 1 



12 2 

P — M ' 

12 4 

M — J 

1 10 



M — J ' 



Possible Score Differences 
Hghst. Poss. Lwst. Poss. 

Scaled Scores Scaled Scores 



■ 160 
136 

- 45 
161 

- 22 

- 11 
15 

1 

- 2 7 

- 1 

- 15 

- 31 

- 7 

- 20 



78 
14 3 
42 

■ 83/ 
60 

■ 37 



-16 
20 
5 

- 3 

- '5 

- 4 
-28 

16 
0 

-26 
18 
-37 
-57 
-18 

30 

- i 

12 

40 
41 
10 



y 



Observed Score Differences 
Hghst. Poss. Lwst. Poss. 
Scaled Scores Scaled Scores 

157 



138 

- 47 
155 

- 28 

- 12 
• 10 

1 

- 26 

- 4 

- 14 

- 31 

- 13 

- 12 

7 
62 
24 
42 

- 52 
51 

- 30 



-13 
21 
3 

- 5 

- 2 

- 6 
-26 

13 

- 3 
-26 

15 
-37 
-57 
-IB 

30 
5 
20 
40 
36 
17 

- 3 



Table 1 Cont'd



Test and 
Equatinj^ Forms 

Philosophy 
Cont'd ^ 



~ J 



^ P. 



10 



Possible^ Score Diffe rences 
Hghst. Pqss . Lwst, Poss/ . 
Scaled Scores 'Scaled Scores' 



-146 ' 88 

- 56 55 

37 64 

-91 * 105 



Observed Score Differences 
Hghst. Poss. Lwst;> Poss ^. 
Scaled Scorers Scaled Scores 



72 
.20 
39 
20 



64 
47 
63 
90 



^0 


- °12 


.13 


^ 40 


U6 . 


42 


^0 


- N,^ 


\2e 


- 18 


116 


-~16 


^2 




131 


10 


122 


14 


^1 




- 17 


- 32 


- 21 




^1 




- 7 


10 


- 8 


13 


'■2 




-105 


56 


- 60 




Political Science 














- ?o 


44 


- 17 




h 


- 


- 11 


32 


15 


33 




^0 


- 44 


35 


- 24 


32 






1 


23 


7 


23 


Psy chol 












^1 




- 9 


26 


— 1 


•2 3. 


°i 




- 30 


34 


- 15 ' 


33 


°i • 




40 


■3 


36 


12 




-- 4o 


0 


- 16 ■ 

> 


- 3 


-13 






10 


- 39 


- 1 


-36 




4 


55 


- ^5 


37 


-22 



Table 1 Cont'd

Test and
Equating Forms

Psychology 
Cont'd 



0. 



0. 



12 



•7 - 4 
Sociolofiy 



M 



12 



12 



— J 



0. 
> 

Spanish ' 
P. r 



P 



1 



10 



— J 



10 



— M 



12 



J 



10 



— J, 



— J 



— M 



10 



12 



— J, 



10 



— M 



12 



— 



Possible Score Differences 
Hghst. Poss. Lwst. Poss.
Scaled Scores Scaled Scores



Observed Score Differences
Hghst. Poss. Lwst. Poss.
Scaled Scores Scaled Scores



- 22 
16 

- 2 
66 

-119 
-128 

- 43 
67 
90 

- 51 
-148 

- 59, 

- 37 
73 
94 

- 40 

- 27 

- 27 

- 2 

- 27 



V - 



- 3 

- 13 

- =6 

- 23 

41 

68 
, 11 

- 29 

- 37 
37 
82 
80 

- 29 

- 63 
70 

7 



- 14 

3 
21 

- 7 



18 
12 

3 
50 

90 
81 
33 

45 
61 
33 
9 3 
9 3 
37 
30 
43 
32 

27 
24 
0 

26 , 



-14 

¥ 

~ 1 



-24 

38 
63 

- 2 

-43 

-50 
24 
42 
68 

-40 

-76 

-87 

-20 

-15"« 

0 

19 
-11 



2Z 



Table 1 Cont'd



Test and
Equating Forms




Possible Score Differences
Hghst. Poss. Lwst. Poss.

Scaled Scores Scaled Scores



^- 2 7 

- 29 

- 4 

- 33 



- 15 

8 
21 



Observed Score Differences
Hghst. Poss. Lwst. Poss.
Scaled Scores Scaled Scores



- 28 

- 27 

- 3 
% - 33 



-13 
9 

23 
-15 



\ 



/ 

/ 

/ 

/ 



Classifications of Advanced Tests by Possible
Score and Observed Score Differences



No extreme differences of greater than 50 score points at
either extreme



bA.kica 111 

History
Literature

Political Science
Spanish



A d fPi 



A difference of more than 50 score

P sib 1 vj ^.n,a^'t'o;> 
Biology

Economics
Engineering
Mathematics



Biology
Education
French

Literature

Political Science

Psychology
ofvairi.rdi 



points at only one extreme



Chemistry

Mathematics
Music



A difference of more

Philosophy
Physics
Sociology



than 50 score points at both extremes

Geology
Philosophy
Physics

Sociology



C'W'aruH. i '.(111 Kqiiat.iiii^ '^^^^i m ha L-gu:.iut, L Li v^j hk.|U'U. i ii^.^ (Vl,)E} 



Biology
Chemistry



High Low



\ 



^ ) 



■1 \ 



I'*:"' :iiah 



:i 1 , a M'V 



^i ^ ^ , M 



a ) 



1- i I t a-a Lu.r 



Ma t.lu'iiia i, i 



I,' 






MU, : I > 



M, -~ M, 



J 

1 '1 



^5 



I /I 



III 



. aa 



'a). 



Table 3 Cont'd



N. 



1 ^) 



Possible Scores
High Low



i.9 
■1 1 



- 'b 



5^6 



\ 

Observed Scores
High Low



iU' 
Ml 



Y 

U 



U2 

- 8 

37 



V. 



mi i; 



V. 




Table 4



Correlations for Tests Equated to Themselves



"New Form"          "Old Form"









Rv irn 
IlA. I Vv^ 










h All 7ft 
^ U 1 04(0 




n" c;7Ai . 




n f^ ill 


n ^66)1 










n fiiV^c; 


u.p f u;) 




n 't^^iQ 

u.;j ji7 


0 5/87 






n A7)iO 






0 6n?8 


0 ^706 


0 ^^21 


0.667^ 




n An)i7 


n ^n8R 






0 6?li7 


0 Ii907 


0 61j91 






n "^68(1" 


0 6nHQ 


0 ^681 • 


0 v^'^ 


0 ^628 


0 ^681 


d.6o8o 


0.3105 
0.3270 


0.5007
0.ii906 


Q.o25^ 

0.6088 ' 


0.5259 
0.677ii 


U.5u21 
O.I1253 


0 . i|00 ( 
0.5922 


0.00 1 
0.6781 


0.7291 


0.3/172 


0.5320 


. 0.7307 ' 


' 0.7lili7 


0.3570 


0.5085 


0.7li5l 




0.6622 


0.6120 


0.681a 


0.3360 


0.5622 


0.5983 


0.5622 


0.6278 


0.5297 


0.6662


0.6i.51i 


0.5I1OI 


0.ii369' 


0.ii9l|2 


0.57l^ii 


0.7281 


0.5327 


0.600il 


0.7373 


0.6611 


0.5506 


O.L982 


0.7086 


0.6318 


O.WS 


0.6239 


0.6355 


, 0.700i| 


0.'539ii 


0.5621 


0.7189 


0.6907 


0.51i49. 


0.5653 


0.7156


0.6297


0.1iii6'0 


0.5112 


0.6ii60 


0.7770 


0.6577 


0.6ii96 


0.8026. ' 


0.7972 


0.6866 


0.6777 


0.8217 


0.2$9S 


0.0160 


0.ii212 


O.278I4 . 


0.3610. 


O.C675 


0.56C3 


0.3959






Appendix 1






Advanced Political Science 







Advanced Spanish Test 







Appendix 2 

Method Used for Equating GRE Advanced Tests Using Verbal and
Quantitative Aptitude Test Scores as Anchor

Suppose two different groups of candidates take two different test
forms designated as form X and form Y. We denote the group taking test X
as group r and the group taking test Y as group s. Suppose further that
test Y has been given sometime in the past, that test X has been
recently administered, and that both groups have taken a Verbal and a Quantitative
test, denoted V and Q respectively. Thus group r has scores on tests X, V,
and Q, and group s has scores on tests Y, V, and Q.

We call form X the "new form" and form Y the "old form," and
desire to make scores on test X comparable to scores on test Y. To do this,
we conceptualize two regressions for each test form. For form X we consider
the regression of the score on test X on the scores on V and Q for group r,
and do the same for the total group t = r + s, even though the total group did
not take test X. These two regressions are denoted by

$$X_r = a_r + b_{1r} V_r + b_{2r} Q_r \tag{1}$$

and

$$X_t = a_t + b_{1t} V_t + b_{2t} Q_t \tag{2}$$

We now make three assumptions. The first is that the intercepts for the two
groups, r and t, are the same, i.e.,

$$\bar{X}_r - b_{1r}\bar{V}_r - b_{2r}\bar{Q}_r = \bar{X}_t - b_{1t}\bar{V}_t - b_{2t}\bar{Q}_t \tag{3}$$

The second is that the regression coefficients are the same, i.e.,

$$b_{1r} = b_{1t} \tag{4}$$

$$b_{2r} = b_{2t} \tag{5}$$

And finally, the variance error of estimate (the expected squared error from
prediction), denoted $VE_X$,

$$VE_X = S_X^2 - b\,C\,b'$$

where b = (b_1, b_2) and
C = the covariance matrix of V and Q,
is the same for both groups:

$$S_{X_r}^2 - b_r C_r b_r' = S_{X_t}^2 - b_t C_t b_t' \tag{6}$$

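Substituting equations (4) and (5) into (3) and solving for $\bar{X}_t$ we obtain

$$\bar{X}_t = \bar{X}_r + b_{1r}(\bar{V}_t - \bar{V}_r) + b_{2r}(\bar{Q}_t - \bar{Q}_r)$$

and since we know all of the terms on the right hand side of the equation,
we have an estimate of how the total group would have done on test X.

Substituting (4) and (5) into (6) and solving for $S_{X_t}^2$ we obtain

$$S_{X_t}^2 = S_{X_r}^2 + b_r (C_t - C_r)\, b_r'$$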

Using exactly the same assumptions and development for the relationship
between form Y and V and Q with groups s and t, we obtain estimates
$\bar{Y}_t$ and $S_{Y_t}^2$ for the mean and variance using the total group.

The conversion of the scores on test X to the corresponding
scores on test Y is found by

$$Y = a' + b' X$$

where $b' = S_{Y_t} / S_{X_t}$ and $a' = \bar{Y}_t - b' \bar{X}_t$.

The common-item approach utilizes exactly the same approach, only
using an anchor test (usually common items) denoted Z instead of V and Q.
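
The procedure described in this appendix can be illustrated with a short computational sketch. It is not taken from the report: the function names (total_group_moments, linear_conversion) and the use of NumPy are assumptions made for illustration. The sketch estimates the regression coefficients from the group that took each form, projects the mean and variance to the total group t = r + s under the three assumptions, and then forms the linear conversion Y = a' + b' X.

```python
import numpy as np


def total_group_moments(form, anchors_own, anchors_total):
    """Estimate total-group mean and variance on a form taken by one group.

    form          : scores on the form (X or Y) for the group that took it
    anchors_own   : anchor scores (e.g. V and Q) for that same group
    anchors_total : anchor scores for the combined group t = r + s
    """
    form = np.asarray(form, dtype=float)
    a_own = np.asarray(anchors_own, dtype=float).reshape(len(form), -1)
    a_tot = np.asarray(anchors_total, dtype=float)
    a_tot = a_tot.reshape(a_tot.shape[0], -1)

    # Covariance matrices of the anchors in the own group and the total group.
    c_own = np.atleast_2d(np.cov(a_own, rowvar=False))
    c_tot = np.atleast_2d(np.cov(a_tot, rowvar=False))

    # Regression coefficients of the form on the anchors, estimated in the
    # group that took the form and assumed equal in the total group.
    cov_anchor_form = np.array(
        [np.cov(a_own[:, j], form)[0, 1] for j in range(a_own.shape[1])]
    )
    b = np.linalg.solve(c_own, cov_anchor_form)

    # Equal intercepts give the total-group mean; equal variance error of
    # estimate gives the total-group variance.
    mean_t = form.mean() + b @ (a_tot.mean(axis=0) - a_own.mean(axis=0))
    var_t = form.var(ddof=1) + b @ (c_tot - c_own) @ b
    return mean_t, var_t


def linear_conversion(x, anchors_r, y, anchors_s):
    """Return (a', b') such that Y = a' + b' * X."""
    anchors_r = np.asarray(anchors_r, dtype=float).reshape(len(x), -1)
    anchors_s = np.asarray(anchors_s, dtype=float).reshape(len(y), -1)
    anchors_t = np.vstack([anchors_r, anchors_s])  # total group t = r + s

    mean_x, var_x = total_group_moments(x, anchors_r, anchors_t)
    mean_y, var_y = total_group_moments(y, anchors_s, anchors_t)

    slope = np.sqrt(var_y / var_x)          # b' = S_Yt / S_Xt
    intercept = mean_y - slope * mean_x     # a' = mean(Y_t) - b' * mean(X_t)
    return intercept, slope
```

Passing a two-column array of V and Q scores corresponds to the aptitude-anchor method, while passing a single column of anchor-test scores Z corresponds to the common-item approach.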






Appendix 3 






ENGINEERING 




FRENCH

[Figure omitted; horizontal axis runs from 0 to 200 in steps of 10.]




FRENCH 







LITERATURE 

OEOUATINGS 







